A Pipeline Approach to Chinese Personal Name Disambiguation
Abstract
In this paper, we describe our system for the Chinese personal name disambiguation task in the first CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP2010). We use a pipeline approach in which preprocessing, discarding of unrelated documents, Chinese personal name extension, and document clustering are performed as separate stages. Chinese personal name extension is the most important part of the system: it uses two additional dictionaries to extract full personal names from Chinese text. Document clustering is then performed separately for each extracted personal name. Experimental results show that our system achieves good performance.
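The four stages named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say what the two dictionaries contain or which clustering algorithm is used, so the sketch assumes both dictionaries are plain sets of known full names and swaps in single-link clustering over character-bigram Jaccard similarity. All function names, the threshold, and the sample data are hypothetical.

```python
from collections import defaultdict


def preprocess(doc):
    # Stage 1 (assumed): placeholder normalisation that strips all whitespace.
    return "".join(doc.split())


def is_related(text, query_name):
    # Stage 2 (assumed): keep only documents that mention the query name.
    return query_name in text


def extend_name(text, query_name, dictionaries):
    # Stage 3 (assumed): pick the longest dictionary entry that contains the
    # query name and occurs in the text; fall back to the query name itself.
    candidates = [
        name
        for dictionary in dictionaries
        for name in dictionary
        if query_name in name and name in text
    ]
    return max(candidates, key=len, default=query_name)


def bigrams(text):
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(texts, threshold):
    # Stage 4 (assumed): single-link clustering. A document joins, and merges,
    # every existing cluster that has a member at or above the threshold.
    feats = [bigrams(t) for t in texts]
    clusters = []
    for i in range(len(texts)):
        linked = [
            c for c in clusters
            if any(jaccard(feats[i], feats[j]) >= threshold for j in c)
        ]
        merged = [i]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return clusters


def disambiguate(docs, query_name, dictionaries, threshold=0.2):
    # Group documents by extended full name, then cluster inside each group.
    groups = defaultdict(list)
    for doc in docs:
        text = preprocess(doc)
        if not is_related(text, query_name):
            continue
        groups[extend_name(text, query_name, dictionaries)].append(text)
    return {name: cluster(texts, threshold) for name, texts in groups.items()}


if __name__ == "__main__":
    docs = [
        "李明亮 是 一名 医生",
        "李明 是 北京 的 教师",
        "教师 李明 在 北京 工作",
        "李明 喜欢 足球",
        "今天 天气 很好",
    ]
    # Two hypothetical name dictionaries.
    print(disambiguate(docs, "李明", [{"李明亮"}, {"王李明"}]))
```

Grouping by extended name before clustering keeps documents about a longer name that merely contains the query string (here the hypothetical 李明亮) from being mixed with documents about the query name itself. The indices in each cluster refer to positions within that name's group, not within the original document list.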
Similar resources
The Chinese Persons Name Diambiguation Evaluation: Exploration of Personal Name Disambiguation in Chinese News
Personal name disambiguation has become a hot topic because it provides a way to incorporate semantic understanding into information retrieval. In this campaign, we explore Chinese personal name disambiguation in news. To examine how well disambiguation technologies work, we concentrate on news articles, which are well-formatted and whose genre is well-studied. We then design a diagnosis test to explor...
Clustering Technique in Multi-Document Personal Name Disambiguation
Focusing on multi-document personal name disambiguation, this paper develops an agglomerative clustering approach to the problem. We start from an analysis of pointwise mutual information between features and the ambiguous name, which leads to a novel feature-weighting method for clustering. Then a trade-off measure between within-cluster compactness and among-cluster se...
Chinese Personal Name Disambiguation: Technical Report of Natural Language Processing Lab of Xiamen University
This report presents our group's work in the Chinese personal name disambiguation workshop. We propose a system that uses a hierarchical agglomerative clustering (HAC) algorithm to cluster mentions referring to the same person, based on features extracted from the documents.
Chinese Personal Name Disambiguation Based on Person Modeling
This document presents the bakeoff results for Chinese personal name disambiguation in the First CIPS-SIGHAN Joint Conference on Chinese Language Processing. The authors introduce the framework of the person disambiguation system LJPD, which uses a new person model. LJPD was built in a short time and was not given enough training and tuning. Evaluation of LJPD shows that the precision is competitive, but the reca...
A Multi-stage Clustering Framework for Chinese Personal Name Disambiguation
This paper presents our systems for participation in the Chinese Personal Name Disambiguation task at CIPS-SIGHAN 2010. We submitted two different systems for this task, and both achieve the best performance. This paper introduces the multi-stage clustering framework and some key techniques used in our systems, and demonstrates experimental results on evaluation data. Finally, we...
Publication date: 2010